Repetition and Median Filtering

In this section, we will learn about source separation approaches that exploit a common feature of musical signals: repetition. In doing so, we will gain some understanding of the mechanics of source separation and of how an algorithm can use assumptions about a signal to separate its sources.

We will explore three algorithms that attempt to separate a repeating background from a non-repeating foreground. The basic assumptions here are 1) that there is repetition in the mixture, and 2) that the repetition captures what we want to separate. These assumptions hold quite well if we want to separate a singer from a backing band, but might not work if we want to isolate a drum set from the rest of the band, because the drum set itself is usually playing a repeating pattern. We will see that two of these approaches leverage the median operation.
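To see why the median is a natural tool here, consider one spectrogram bin observed over five repetitions of the background, where a foreground event happens to land on the third repetition. The numbers below are made up purely for illustration.

```python
import numpy as np

# Hypothetical energy of one time-frequency bin across five repetitions.
# The background contributes 1.0 each time; a foreground event adds
# energy in the third repetition only.
repetitions = np.array([1.0, 1.0, 9.0, 1.0, 1.0])

mean_estimate = repetitions.mean()        # pulled upward by the outlier
median_estimate = np.median(repetitions)  # ignores the single outlier

print(mean_estimate, median_estimate)
```

The median recovers the background value, while the mean is contaminated by the foreground. This only works as long as the foreground occupies a given bin in fewer than half of the repetitions.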

We will also explore a related method, Harmonic-Percussive Source Separation (HPSS), that uses similar techniques to separate harmonic sounds from percussive ones.

For more details about these algorithms

REPET

The first algorithm we will explore here is called the REpeating Pattern Extraction Technique, or REPET {cite}. REPET works like this:

  1. Find a repeating period, \(t_r\) seconds (e.g., the number of seconds after which a chord progression starts over).

  2. Segment the spectrogram into \(N\) segments, each \(t_r\) seconds long.

  3. “Overlay” those \(N\) segments.

  4. Take the median of those \(N\) stacked segments and make a mask of the median values.
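Steps 2 through 4 can be sketched in a few lines of NumPy. This is a minimal illustration, not nussl's implementation: the function name `repet_mask` is hypothetical, the period is assumed to be already known and expressed in spectrogram frames rather than seconds, and the toy "spectrogram" is random data.

```python
import numpy as np

def repet_mask(mag, period):
    """Soft background mask from a magnitude spectrogram (n_freq, n_frames).

    `period` is the repeating period in frames (step 1, assumed known).
    """
    n_freq, n_frames = mag.shape
    n_segments = int(np.ceil(n_frames / period))

    # Step 2: cut into segments; pad the last one with NaN so it is
    # ignored by the median instead of biasing it.
    padded = np.full((n_freq, n_segments * period), np.nan)
    padded[:, :n_frames] = mag
    segments = padded.reshape(n_freq, n_segments, period)

    # Steps 3 and 4: "overlay" the segments and take the median across them.
    repeating = np.nanmedian(segments, axis=1)  # (n_freq, period)

    # Tile the repeating model back out to full length. The background
    # cannot have more energy than the mixture, so clip it.
    model = np.tile(repeating, (1, n_segments))[:, :n_frames]
    model = np.minimum(model, mag)
    return model / (mag + 1e-8)

# Toy mixture: a pattern of 5 frames repeated 6 times, plus one
# non-repeating "foreground" event at bin 2, frame 7.
rng = np.random.default_rng(0)
pattern = rng.random((4, 5)) + 0.5
mix = np.tile(pattern, (1, 6))
mix[2, 7] += 3.0

mask = repet_mask(mix, period=5)
```

The mask is close to 1 wherever the mixture matches the repeating model and drops well below 1 at the bin holding the foreground event, which is what lets the mask pull the background out of the mixture.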

We’ll use REPET to demonstrate how to run a source separation algorithm in nussl.

# Do our imports
import nussl
import matplotlib.pyplot as plt

Let’s download an audio file that has a lot of repetition in it, and inspect and listen to it:

audio_path = nussl.efz_utils.download_audio_file('historyrepeating_7olLrex.wav')
history = nussl.AudioSignal(audio_path)
history.embed_audio()

plt.figure(figsize=(10, 3))
nussl.utils.visualize_spectrogram(history)
plt.title(str(history))
plt.tight_layout()
plt.show()
Saving file at /home/runner/.nussl/audio/historyrepeating_7olLrex.wav
Downloading historyrepeating_7olLrex.wav from http://nussl.ci.northwestern.edu/static/audio/historyrepeating_7olLrex.wav


../../_images/repetition_4_141.png

Now we need to instantiate a Repet object in nussl. We can do that like so:

repet = nussl.separation.primitive.Repet(history)

Now that the repet object has our AudioSignal, it’s easy to run the algorithm:

repet.run()
[<nussl.core.masks.soft_mask.SoftMask at 0x7f06e49cdad0>,
 <nussl.core.masks.soft_mask.SoftMask at 0x7f06e21aaed0>]

Oh, look! The repet object returned masks! We can get audio signals back by doing the following:

r_estimates = repet.make_audio_signals()

We can also chain both of those operations if we don’t care about the intermediate steps:

r_estimates = repet()
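Conceptually, turning a soft mask into a source estimate is an element-wise multiplication with the mixture's complex STFT, followed by an inverse STFT. The sketch below shows only the masking step on random stand-in data. It assumes the two masks are complementary, and the variable names are illustrative rather than nussl attributes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the complex STFT of the mixture (n_freq, n_frames).
mix_stft = rng.normal(size=(5, 8)) + 1j * rng.normal(size=(5, 8))

# Stand-in soft masks with values between 0 and 1.
background_mask = rng.random((5, 8))
foreground_mask = 1.0 - background_mask

# Masking scales each bin's magnitude and reuses the mixture's phase.
background_stft = background_mask * mix_stft
foreground_stft = foreground_mask * mix_stft

# An inverse STFT of each masked STFT would give the audio estimates.
```

Because the masks sum to one, the two estimates add back up to the mixture exactly.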

Let’s check out the masks that repet made:

def visualize_and_embed(sources):
    """A helper function to make some nice graphs"""
    sources = {
        'Background': sources[0],
        'Foreground': sources[1]
    }
    plt.figure(figsize=(10, 7))
    plt.subplot(211)
    nussl.utils.visualize_sources_as_masks(
        sources, db_cutoff=-60, y_axis='mel')
    plt.subplot(212)
    nussl.utils.visualize_sources_as_waveform(
        sources, show_legend=False)
    plt.tight_layout()
    plt.show()

    nussl.play_utils.multitrack(sources, ext='.wav')

visualize_and_embed(r_estimates)
../../_images/repetition_15_1.png

And there are our foreground and background sources!

The process of running a separation algorithm in nussl was only two steps:

  1. Instantiate a separation object with an audio signal. E.g., repet = nussl.separation.primitive.Repet(history)

  2. Run the object to get the results. E.g. repet()

Now let’s look at a few other algorithms that leverage repetition in a musical recording and compare results to REPET.

REPET-SIM

REPET-SIM is a variant of REPET that doesn’t rely on a fixed repeating period. In fact, it doesn’t rely on repetition as explicitly as REPET does. REPET-SIM calculates a similarity matrix between each pair of spectral frames in an STFT, selects the \(k\) nearest neighbors for each frame, and makes a mask by taking the median of each frequency bin across the selected neighbors.
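The idea can be sketched with NumPy as follows. This is a simplified illustration under stated assumptions, not nussl's implementation: the function name `repet_sim_mask` is hypothetical, cosine similarity is used as the frame similarity, and each frame counts as one of its own neighbors.

```python
import numpy as np

def repet_sim_mask(mag, k=3):
    """Soft background mask from a magnitude spectrogram (n_freq, n_frames)."""
    # Cosine similarity between every pair of frames (columns).
    norms = np.linalg.norm(mag, axis=0, keepdims=True) + 1e-8
    unit = mag / norms
    similarity = unit.T @ unit  # (n_frames, n_frames)

    # For each frame, the indices of its k most similar frames.
    neighbors = np.argsort(-similarity, axis=1)[:, :k]  # (n_frames, k)

    # Median of each frequency bin across the k neighbors of each frame.
    model = np.median(mag[:, neighbors], axis=2)  # (n_freq, n_frames)
    model = np.minimum(model, mag)
    return model / (mag + 1e-8)

# Toy mixture: two background frame types that recur irregularly,
# so there is no fixed repeating period.
frame_a = np.array([1.0, 1.0, 0.0, 0.0])
frame_b = np.array([0.0, 0.0, 1.0, 1.0])
order = [frame_a, frame_b, frame_a, frame_a, frame_b,
         frame_a, frame_b, frame_b, frame_a, frame_b]
mix = np.stack(order, axis=1)  # (4, 10)
mix[1, 3] += 2.0               # non-repeating event in frame 3

mask = repet_sim_mask(mix, k=3)
```

Frame 3 is still most similar to the other frames of its own type, so the median across its neighbors recovers the background and the mask attenuates only the bin holding the extra event.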

We can run REPET-SIM the same way we can run REPET:

repet_sim = nussl.separation.primitive.RepetSim(history)
rs_estimates = repet_sim()

visualize_and_embed(rs_estimates)
../../_images/repetition_17_0.png

2DFT

We can also use a Two-dimensional Fourier Transform (2DFT) of a spectrogram to find repeating and non-repeating patterns. Repeating sections show up as peaks in the 2DFT and non-repeating parts are everything else. We can use a peak picker to separate the repeating from the non-repeating parts. That’s what this algorithm does:

# We can't start a variable name with a number,
# so this object is called FT2D
ft2d = nussl.separation.primitive.FT2D(history)
ft2d_estimates = ft2d()
visualize_and_embed(ft2d_estimates)
../../_images/repetition_19_0.png
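To build some intuition for why repetition shows up as peaks, the sketch below takes the 2DFT of a perfectly periodic toy spectrogram and then applies a deliberately crude peak picker. Both the function name `ft2d_background_mask` and the peak-picking rule (a local maximum along the time-modulation axis that also exceeds the mean) are assumptions made for illustration; they are not the peak picker nussl uses.

```python
import numpy as np

def ft2d_background_mask(mag):
    """Soft background mask from a magnitude spectrogram (n_freq, n_frames)."""
    ft = np.fft.fft2(mag)
    ft_mag = np.abs(ft)

    # Crude peak picker: keep bins that are at least as large as both
    # neighbors along axis 1 and larger than the overall mean.
    left = np.roll(ft_mag, 1, axis=1)
    right = np.roll(ft_mag, -1, axis=1)
    peaks = (ft_mag >= left) & (ft_mag >= right) & (ft_mag > ft_mag.mean())

    # Invert only the peaks to get a repeating-background estimate.
    background = np.real(np.fft.ifft2(ft * peaks))
    background = np.clip(background, 0.0, mag)
    return background / (mag + 1e-8)

# A pattern of 8 frames repeated 8 times: perfectly periodic in time.
rng = np.random.default_rng(0)
pattern = rng.random((16, 8))
n_repeats = 8
periodic = np.tile(pattern, (1, n_repeats))  # (16, 64)

# Its 2DFT is non-zero only in columns that are multiples of n_repeats.
periodic_ft = np.abs(np.fft.fft2(periodic))
off_peak = np.delete(periodic_ft, np.arange(0, 64, n_repeats), axis=1)

# Add a non-repeating event and compute a mask for the mixture.
mixture = periodic.copy()
mixture[:, 20] += 2.0
mask = ft2d_background_mask(mixture)
```

For the purely periodic signal, all the energy sits in a regular grid of columns, while a non-repeating event spreads its energy across every column. That difference is what a peak picker exploits.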

Harmonic-Percussive Source Separation (HPSS)

If you spend enough time visualizing musical signals on a spectrogram, you start to notice that harmonic sounds look similar to horizontal stripes on a spectrogram and percussive sounds look similar to vertical stripes. Harmonic-Percussive Source Separation takes advantage of this insight by applying a median filter across time frames (horizontally, which enhances harmonic content) and across frequency bins (vertically, which enhances percussive content) to make a mask:

hpss = nussl.separation.primitive.HPSS(history)
# hpss gives harmonic then percussive,
# so let's reverse the order of the list
hpss_estimates = hpss()[::-1]
visualize_and_embed(hpss_estimates)
../../_images/repetition_21_0.png
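The two median filters can be sketched with NumPy on a toy spectrogram that contains one horizontal stripe and one vertical stripe. The helper names `median_filter_along` and `hpss_masks`, the filter length of 5, and the squared-ratio soft mask are illustrative assumptions rather than nussl's exact settings. The sketch needs NumPy 1.20 or later for `sliding_window_view`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def median_filter_along(mag, size, axis):
    """Running median of odd length `size` along `axis`, with edge padding."""
    pad = size // 2
    pad_width = [(0, 0), (0, 0)]
    pad_width[axis] = (pad, pad)
    padded = np.pad(mag, pad_width, mode="edge")
    windows = sliding_window_view(padded, size, axis=axis)
    return np.median(windows, axis=-1)

def hpss_masks(mag, size=5):
    """Soft harmonic and percussive masks for a spectrogram (n_freq, n_frames)."""
    # Filtering along time keeps horizontal stripes (harmonic content).
    harmonic = median_filter_along(mag, size, axis=1)
    # Filtering along frequency keeps vertical stripes (percussive content).
    percussive = median_filter_along(mag, size, axis=0)
    total = harmonic ** 2 + percussive ** 2 + 1e-8
    return harmonic ** 2 / total, percussive ** 2 / total

# Toy spectrogram: a sustained tone in bin 5 and a click at frame 10.
mag = np.zeros((12, 20))
mag[5, :] = 1.0
mag[:, 10] = 1.0

harmonic_mask, percussive_mask = hpss_masks(mag)
```

The harmonic mask is close to 1 along the horizontal stripe and the percussive mask is close to 1 along the vertical stripe. Where the two cross, both filters respond equally and each mask takes roughly half.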

Next Steps…

There you have it: four simple algorithms, three that separate repeating from non-repeating parts and one that separates harmonic from percussive parts.

Next we’ll talk about how we can model timbre using Non-negative Matrix Factorization.